1 Learning objectives

  1. Add external data coordinates within the scripts using the tribble() function.

  2. Add text and labels to ggplot maps using the {ggsflabel} package.

  3. Add color palettes to spatial data using the {colorspace} package.

  4. Add arrow and scale annotations in ggplot maps using the {ggspatial} package.

2 Prerequisites

This lesson requires the following packages:

if(!require('pacman')) install.packages('pacman')

pacman::p_load(tidyverse,
               colorspace,
               ggspatial,
               janitor,
               ggplot2,
               spData,
               units,
               sf)

pacman::p_load_gh("yutannihilation/ggsflabel",
                  "afrimapr/afrihealthsites",
                  "afrimapr/afrilearndata",
                  "wmgeolab/rgeoboundaries")

3 Introduction

Until now, we have learnt general concepts about geospatial visualization in independent lessons.

The modular approach of ggplot2 allows to successively add all of them in different layers. For instance, study sites or administrative delineations, and a visual representation of their characteristics in a single map, using useful color palettes.

These enriched thematic maps also require to contain text and labels referring to specific places or regions, and important map elements like scale bars and a north arrow, as will be illustrated in this part.

Figure 1. Ggplot map with multiple layers.

4 Build an informative map

We will create a map to figure out how are Hospitals distributed in the Western Province of Sierra Leone. For this, we are going to simulate the location of two Hospital manually collected in the field, and also retrieve real Hospital information from a public repository.

Add field sites with tribble()

We start by defining two Hospital sites (point data), according to their longitude and latitude, collected by GPS devices in the field, using the tribble() function from the {tibble} package:

sites <- tribble(~gps_name, ~gps_longitude, ~gps_latitude,
                 "site A", -13.1491617, 8.1920813,
                 "site B", -13.1066807, 8.2180983
                 )

sites
## # A tibble: 2 × 3
##   gps_name gps_longitude gps_latitude
##   <chr>            <dbl>        <dbl>
## 1 site A           -13.1         8.19
## 2 site B           -13.1         8.22

Create a tibble object with three columns (gps_name, gps_longitude, gps_latitude) with the GPS location of one "household" with longitude -13.1856942 and latitude 8.2851963. Use the tribble() function.

tribble is better used for a minimum amount of observations. For larger amounts you may prefer to read your data from a csv or excel file.

These sites belong to Sierra Leone:

sierra_leone <- geoboundaries(country = "Sierra Leone", adm_lvl = 1)

To add these geographic coordinates to a country map, we need to use the power of sf.

Converting the data frame to a sf object allows to rely on sf to handle on the fly the coordinate system (both projection and extent), which can be very useful if the two objects (here sierra_leone, and sites) are not in the same projection.

To achieve this, the projection (here WGS84, which is the CRS code #4326) has to be a priori defined in the sf object with the st_as_sf() function (for more information: see previous lessons) :

sites_sf <- sites %>% 
  st_as_sf(coords = c("gps_longitude","gps_latitude"),
           crs = 4326)

Now we can make a map with sierra_leone and sites_sf objects, and add the type of data (gps_name) in the legend:

ggplot() +
  geom_sf(data = sierra_leone) +
  geom_sf(data = sites_sf, mapping = aes(color = gps_name))

We can use coord_sf() after all geom_sf() calls, to zoom in to our area of interest inside Sierra Leone:

ggplot() +
  # geometries
  geom_sf(data = sierra_leone) +
  geom_sf(data = sites_sf, mapping = aes(color = gps_name)) +
  # map extent
  coord_sf(xlim = c(-13.5,-12.7), ylim = c(8.0,8.7))

As such, we can adjust all characteristics of points (e.g. color of the outline and the filling, shape, size, etc.), for each geom_sf() layer. In this example, we set the two points as filled diamonds (shape = 18) with a bigger size (size = 2):

ggplot() +
  # geometries
  geom_sf(data = sierra_leone) +
  geom_sf(data = sites_sf, mapping = aes(color = gps_name),
          shape = 18, size = 4) +
  # map extent
  coord_sf(xlim = c(-13.5,-12.7), ylim = c(8.0,8.7))

4.1 Add province labels with geom_sf_label_repel()

It would be informative to add finer administrative information on top of the previous map, starting with administrative borders (sf object: polygon data) and their names. The package ggsflabel provides functions to add labels to sf objects (points and polygons) in maps.

Province names are part of this data, as the shapeName variable.

sierra_leone %>% 
  ggplot() +
  geom_sf() +
  geom_sf_label_repel(aes(label=shapeName))

The zimbabwe_adm1 object contains the boundaries of all the provinces in Zimbabwe.

zimbabwe_adm1 <- geoboundaries(country = "Zimbabwe", adm_lvl = 1)

Create a map of Zimbabwe with labels for the name of each of its provinces.

To continue building the complex map introduced at the beginning of the lesson, province data is directly plotted as an additional sf layer using geom_sf(). In addition, province names will be added using geom_sf_label_repel(), as well as the label (from shapeName), and a relatively big font size.

ggplot() +
  # geometries
  geom_sf(data = sierra_leone) +
  geom_sf(data = sites_sf, mapping = aes(color = gps_name),
          shape = 18, size = 2) +
  # labels
  geom_sf_label_repel(data = sierra_leone, mapping = aes(label=shapeName)) +
  # map extent
  coord_sf(xlim = c(-13.5,-12.7), ylim = c(8.0,8.7))

We can drop the province name of “Eastern”. For this, we can use filter() from the {dplyr} package:

ggplot() +
  # geometries
  geom_sf(data = sierra_leone) +
  geom_sf(data = sites_sf, mapping = aes(color = gps_name),
          shape = 18, size = 2) +
  # label
  geom_sf_label_repel(data = sierra_leone %>% filter(shapeName!="Eastern"),
                      mapping = aes(label=shapeName)) +
  # map extent
  coord_sf(xlim = c(-13.5,-12.7), ylim = c(8.0,8.7))

Fill district data with {colorspace}

Districts (polygon data) can be retrieved from local shapefile data. This time, only districts from Western province are retained:

sierra_leone_west <- 
  sf::read_sf(dsn = here::here("ch06_basic_geospatial_viz",
                               "data","gis","shp",
                               "sle_adm3.shp"),
              quiet = TRUE) %>% 
  filter(admin1Name=="Western")

ggplot(data = sierra_leone_west) +
  geom_sf()

This time, for all the districts from the province retained, we compute their area using st_area() from the package sf:

sierra_leone_shp <- sierra_leone_west %>% 
  # calculate area of each polygon
  mutate(area_m2 = st_area(.)) %>%
  # convert m2 to km2 
  mutate(area_km2 = units::set_units(.$area_m2,km^2)) %>% 
  # transform the variable to numeric
  mutate(area_km2 = as.numeric(area_km2))
  
sierra_leone_shp %>% 
  select(area_km2)
## Simple feature collection with 12 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -13.29894 ymin: 8.094272 xmax: -12.91333 ymax: 8.499809
## Geodetic CRS:  WGS 84
## # A tibble: 12 × 2
##    area_km2                                                                   geometry
##       <dbl>                                                         <MULTIPOLYGON [°]>
##  1  167.    (((-13.02082 8.381989, -13.0188 8.377952, -13.01781 8.377952, -13.01603 8…
##  2   38.9   (((-13.21496 8.474341, -13.21479 8.474289, -13.21465 8.474296, -13.21455 …
##  3  167.    (((-13.12215 8.413156, -13.12155 8.413105, -13.12013 8.413218, -13.11941 …
##  4  243.    (((-13.24441 8.095223, -13.24574 8.094272, -13.24742 8.094518, -13.2491 8…
##  5    2.30  (((-13.22646 8.489716, -13.22648 8.48955, -13.22644 8.489513, -13.22663 8…
##  6    1.75  (((-13.2129 8.494033, -13.21076 8.494026, -13.21013 8.494041, -13.2096 8.…
##  7    1.83  (((-13.22653 8.491883, -13.22647 8.491853, -13.22642 8.49186, -13.22633 8…
##  8    0.796 (((-13.23154 8.491768, -13.23141 8.491566, -13.23144 8.49146, -13.23131 8…
##  9   20.8   (((-13.28529 8.497354, -13.28456 8.496497, -13.28403 8.49621, -13.28338 8…
## 10    2.23  (((-13.24677 8.493453, -13.24669 8.493285, -13.2464 8.493132, -13.24627 8…
## 11    6.69  (((-13.25698 8.485518, -13.25685 8.485501, -13.25668 8.485505, -13.25657 …
## 12   37.9   (((-13.20465 8.485758, -13.20461 8.485698, -13.20449 8.485757, -13.20431 …

We can now fill in the districts using their area to visually identify the largest counties. For this, we use the {colorspace} package with some transparency. In this case, area_km2 is a continuous variable, with a sequential scale type:

sierra_leone_shp %>% 
  ggplot() +
  # geometry
  geom_sf(aes(fill = area_km2)) +
  # aesthetic
  colorspace::scale_fill_continuous_sequential(palette="Reds 3", alpha = 0.8)

The scales are called via the scheme

scale_<aesthetic>_<datatype>_<colorscale>()

where:

  • <aesthetic> is the name of the aesthetic (fill, color, colour).
  • <datatype> is the type of the variable plotted (discrete, continuous, binned).
  • <colorscale> sets the type of the color scale used (qualitative, sequential, diverging, divergingx).

This is a ggplot map with the afriairports object:

afriairports %>% 
  ggplot() +
  geom_sf(aes(color = elevation_ft))

Paste this code to your answer and:

Update the color aesthetic of the continuous variable elevation_ft of this map to a diverging scale, with a midpoint (mid) in 5000.

A Sequential colors scale indicate:

  • Which values are larger or smaller than which other ones, and
  • How distant two specific values are from each other. This implies that the color scale needs to be perceived to vary uniformly across its entire range.

A Diverging color scale allows to:

  • visualize the deviation of data values in one of two directions relative to a neutral midpoint, usually is represented by a light color,
  • For example, a dataset containing both positive and negative numbers, and show how far in either direction it deviates from zero.

You can use {colorspace} package to:

Add a Scale bar and North arrow with {ggspatial}

We introduce here the package {ggspatial}, which provides easy-to-use functions to create a scale bar and north arrow on a ggplot map:

  • annotation_north_arrow() allows to add the north symbol and
  • annotation_scale() a scale bar.
sierra_leone_shp %>% 
  ggplot() +
  # geometry
  geom_sf(aes(fill = area_km2)) +
  # aesthetic
  colorspace::scale_fill_continuous_sequential(palette="Reds 3", alpha = 0.8) +
  annotation_north_arrow(location="tr") +
  annotation_scale(location="br")

The location of the scale bar and north arrow are by default in the bottom left ("bl") side of the map. They can be specified using the location argument with "tr" for top right, "bl" for bottom left, etc.

This is a ggplot map with the zimbabwe_adm1 object:

zimbabwe_adm1 %>% 
  ggplot() + 
  geom_sf()

Paste this code to your answer and:

Add a Scale bar located in the bottom right of the map, and a North arrow in the top left.

  • The North arrow style in annotation_north_arrow() can also be adjusted using the style argument.

  • Note that scale distance is set to "km" by default in annotation_scale(); you can set it in “m”, “cm”, “mi”, “ft”, or “in”.

Add Health site names with geom_sf_text_repel()

To make a more complete map of the Western province of Sierra Leona, Health facilities (sf object: point data) will be added to the map.

Instead of looking up coordinates manually, the package {afrihealthsites} provides a function afrihealthsites(), which allows to retrieve geographic coordinates of African health facilities from different sources:

You can run the following code to retrieve geographic coordinates of all the health facilities in Sierra Leone available in the who database:

sle_healthsites_all <- afrihealthsites(country = "Sierra Leone", 
                                       datasource='who',
                                       plot = FALSE, 
                                       returnclass = "dataframe") %>% 
  janitor::clean_names()

sle_healthsites_all
## # A tibble: 1,120 × 12
##    country    admin1 facility_name facility_type ownership   lat  long ll_source iso3c
##    <chr>      <chr>  <chr>         <chr>         <chr>     <dbl> <dbl> <chr>     <chr>
##  1 Sierra Le… Easte… Ahmadiyya Mi… Mission Hosp… FBO        7.89 -11.2 GPS       SLE  
##  2 Sierra Le… Easte… Baama Commun… Community He… MoH        8.36 -11.3 GPS       SLE  
##  3 Sierra Le… Easte… Baiama Commu… Community He… MoH        8.52 -11.0 GPS       SLE  
##  4 Sierra Le… Easte… Baiima Commu… Community He… MoH        8.03 -10.8 GPS       SLE  
##  5 Sierra Le… Easte… Baiwala Comm… Community He… MoH        8.00 -10.6 GPS       SLE  
##  6 Sierra Le… Easte… Bambara Kaim… Community He… MoH        8.48 -11.3 GPS       SLE  
##  7 Sierra Le… Easte… Bambara Mate… Maternal & C… MoH        8.32 -11.3 GPS       SLE  
##  8 Sierra Le… Easte… Bambawulo Co… Community He… MoH        8.01 -11.1 GPS       SLE  
##  9 Sierra Le… Easte… Bandajuma Co… Community He… MoH        8.31 -10.8 GPS       SLE  
## 10 Sierra Le… Easte… Bandajuma Kp… Community He… MoH        8.02 -10.9 GPS       SLE  
## # … with 1,110 more rows, and 3 more variables: facility_type_9 <chr>, tier <dbl>,
## #   tier_name <chr>

Access to all the health facilities of Zimbabwe from the healthsites data source using the afrihealthsites() function.

According to the three-tier health delivery classification, the highest tier (Tier 3) includes county hospitals in rural regions and city hospitals in urban regions, which are responsible for most inpatient services as well as teaching and research missions (Wang, 2021).}

We can now keep the Tier 3 health facilities inside the Western province, and convert the data frame with coordinates to sf format:

sle_healthsites_set <- sle_healthsites_all %>%
  filter(admin1=="Western Area") %>% 
  filter(tier==3) %>% 
  st_as_sf(coords = c("long","lat"),
           crs = 4326)

We add hospital locations and names as text on the map with geom_sf_text():

sle_healthsites_set %>% 
  ggplot() +
  # geometry
  geom_sf() +
  # label
  geom_sf_text(mapping = aes(label= facility_name))

This is not really satisfactory, as the names overlap on the points, and they are not easy to read on the grey background. The package {ggsflabel} offers a very flexible approach (inspired in {ggrepel}) to deal with label placement in ggplot maps with sf objects (with geom_sf_text_repel and geom_sf_label_repel), including automated movement of labels in case of overlap.

We use it here to “nudge” the labels away, and connect them to the city locations:

sle_healthsites_set %>% 
  ggplot() +
  # geometry
  geom_sf() +
  # label
  geom_sf_text_repel(mapping = aes(label= facility_name))

This is a ggplot map of all the hospital facilities of Zimbabwe:

afrihealthsites(country = 'Zimbabwe', 
                      datasource='healthsites',
                      plot = FALSE, 
                      returnclass = 'dataframe') %>% 
  filter(amenity == 'hospital') %>% 
  ggplot() +
  geom_sf()

Paste this code to your answer and:

Add their names as text without overlaps to a ggplot map.

Additionally, we can manually set {ggrepel} arguments to improve its output:

  • The size (argument size);
  • The type of font of the text (fontface);
  • The force of repulsion between overlapping text labels (force);
  • The additional padding around the each text label (box.padding).
sle_healthsites_set %>% 
  ggplot() +
  # geometry
  geom_sf() +
  # label
  geom_sf_text_repel(mapping = aes(label= facility_name),
                     size = 3,
                     fontface = "bold",
                     force = 40, 
                     box.padding = 0.6)

5 Final map

For the final map, we put everything together, having a general background map based on the Sierra Leona map, with district delineations, province labels, main hospital names and locations, custom GPS collected field sites, as well as a theme adjusted with axis labels, and a north arrow and scale bar:

ggplot() +
  ## geometries
  # background map
  geom_sf(data = sierra_leone) +
  # district polygons filled by area
  geom_sf(data = sierra_leone_shp, 
          mapping = aes(fill = area_km2)) +
  # hospital points 
  geom_sf(data = sle_healthsites_set) +
  # field site points 
  geom_sf(data = sites_sf, 
          mapping = aes(color = gps_name),
          shape = 18, 
          size = 4) +
  ## labels
  # hospital names with repelled text 
  geom_sf_text_repel(data = sle_healthsites_set,
                     mapping = aes(label = facility_name),
                     size         = 2,
                     fontface     = "bold",
                     box.padding  = 0.6,
                     force        = 0.5,
                     nudge_x      = -0.25,
                     direction    = "y",
                     hjust        = 1,
                     segment.size = 0.2) +
  # province names with repelled labels
  geom_sf_label_repel(data = sierra_leone %>% 
                        filter(shapeName!="Eastern"),
                      mapping = aes(label=shapeName)) +
  ## aesthetics
  # alternative color scale for fill
  colorspace::scale_fill_continuous_sequential(palette="Reds 3", 
                                               alpha = 0.8) +
  # map annotation
  annotation_north_arrow(location="tl") +
  annotation_scale(location="bl") +
  # map extent
  coord_sf(xlim = c(-13.5,-12.7), 
           ylim = c(8.0,8.7)) +
  # ggplot labels
  labs(x = "Longitude",
       y = "Latitude",
       fill = expression(Area~km^2),
       color = "GPS data",
       title = "How are Hospitals distributed in the Western Province of Sierra Leone?")

6 Wrap up

This example fully demonstrates that adding layers on ggplot2 is relatively straightforward, as long as the data is properly stored in an sf object. Adding additional layers like external data, color palettes, point or polygon labels and map annotations would simply follow the same logic, with additional calls after geom_sf() and at the right place in the ggplot2 sequence.

Contributors

The following team members contributed to this lesson:

References

Some material in this lesson was adapted from the following sources:

This work is licensed under the Creative Commons Attribution Share Alike license. Creative Commons License